| Simulation scenario | Number of monitored reefs | Number of monitored years |
|---|---|---|
| High-resolution | 25 | 15 |
| Medium-resolution | 15 | 15 |
| Low-resolution | 5 | 15 |
| Temporally-sparse | 50 | 2-15 |
To be displayed on a big screen.
Introduction to synthos
The R package synthos (available on GitHub) generates synthetic data with the primary aim of testing new methods and sampling designs in a controlled environment. The data are created from spatio-temporal dependency structures mimicking realistic baselines, population dynamics, disturbance regimes and stochastic processes.
The generation of synthetic data is based on four main steps:
- Generation of the spatio-temporal domains: the first step consists of generating virtual coral reefs across a spatial domain.
- Generation of the disturbances: three disturbance types are generated (heat stress events, cyclones and “other”) using a mechanistic approach that produces different spatial and temporal footprints across the spatial domain.
- Generation of baselines: these represent coral cover values in the year prior to the first sampling.
- Generation of sampling designs: the user selects the number of monitoring locations and surveyed years, as well as any additional spatial scales.
Coral cover values are generated by combining baseline values, disturbance effects, and growth rates, and then projecting their cumulative effects across the spatial domain. The sampling design specifies the number of observations and years to be selected within a domain, creating the observed dataset. Models are then fitted to these observed data, and their predictive performance is evaluated at locations that are not monitored but for which true values are known.
Synthos modelling pipeline
We developed a modelling pipeline to test the predictive capabilities of the spatio-temporal FRK model under different sampling designs. Four scenarios are explored, with varying numbers of monitored reefs and years (Table 1).
Other settings were kept constant across the simulation scenarios. The coral baseline was generated using a west–east gradient of coral cover, consistent with patterns observed on the Great Barrier Reef. The relative influence of disturbances on coral dynamics was set to 60% for heat stress events, 39% for cyclones, and 1% for other factors. Coral growth was fixed at 30%, with coral cover allowed to range from 1% to 70%. The sampling design emulates the monitoring program of the Australian Institute of Marine Science, in which data are collected within each reef from 2 sites, with 5 transects per site and 100 photo frames per transect. Finally, point-based observations are generated to mimic the outputs of the machine learning system used in the ReefCloud platform, with 50 points per frame automatically classified.
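The sampling effort implied by this design can be worked out directly. The sketch below is illustrative Python and is not part of synthos; the constant and function names are hypothetical, and the scenario totals assume that every monitored reef is surveyed in every monitored year, which may not hold for the temporally-sparse scenario.

```python
# Counts taken from the sampling design described in the text.
SITES_PER_REEF = 2
TRANSECTS_PER_SITE = 5
FRAMES_PER_TRANSECT = 100
POINTS_PER_FRAME = 50


def points_per_reef_year():
    """Classified points collected on one reef in one survey year."""
    frames = SITES_PER_REEF * TRANSECTS_PER_SITE * FRAMES_PER_TRANSECT
    return frames * POINTS_PER_FRAME


def scenario_points(n_reefs, n_years):
    """Total points for a scenario.

    Assumption (not stated in the source): each of the n_reefs reefs
    is surveyed in each of the n_years years.
    """
    return n_reefs * n_years * points_per_reef_year()


if __name__ == "__main__":
    print(points_per_reef_year())   # points per reef per survey year
    print(scenario_points(25, 15))  # high-resolution scenario
```

Under these assumptions, one reef contributes 1,000 photo frames and 50,000 classified points per survey year.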
The spatial domain is divided into a 0.1° tessellation grid, from which tier-level disturbance values are extracted and coral cover values are averaged for use in the spatio-temporal model. The predictive performance of the spatio-temporal model in each scenario is assessed using four predictive measures (see details below). These metrics are calculated from model predictions at predictive-tiers, where true values are known.
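As a rough illustration of the tier-level averaging, the sketch below assigns each observation to a 0.1° cell and averages coral cover within each cell. It is written in Python rather than with the synthos functions, it assumes a simple square longitude/latitude grid (the shape of the actual tessellation is not described here), and the names `tier_id` and `mean_cover_by_tier` are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.1  # grid resolution stated in the text


def tier_id(lon, lat, cell=CELL_SIZE_DEG):
    """Return the (column, row) index of the square cell containing a point."""
    return (math.floor(lon / cell), math.floor(lat / cell))


def mean_cover_by_tier(records):
    """Average coral cover (%) over all records falling in the same cell.

    `records` is an iterable of (lon, lat, cover_percent) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # tier -> [sum of cover, count]
    for lon, lat, cover in records:
        acc = totals[tier_id(lon, lat)]
        acc[0] += cover
        acc[1] += 1
    return {tier: s / n for tier, (s, n) in totals.items()}


if __name__ == "__main__":
    # Made-up coordinates and cover values, for illustration only.
    demo = [
        (145.23, -16.47, 20.0),
        (145.27, -16.42, 40.0),
        (145.31, -16.42, 10.0),
    ]
    print(mean_cover_by_tier(demo))
```

The first two demo points share a cell and are averaged together, while the third falls in the neighbouring cell to the east.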
Additional results on synthetic data visualization, predictions across multiple spatial scales, uncertainty, model goodness-of-fit, and disturbances are presented below.
Monitoring locations
[Figures: monitoring locations]
Data visualization
[Figures: data visualization]
Regional trends
[Figures: regional trends]
Model prediction
[Figures: model prediction]
Model uncertainty
[Figures: model uncertainty]
Trends at data-tiers
[Figures: trends at data-tiers]
Trends at predictive-tiers
[Figures: trends at predictive-tiers]
Model attribution
[Figures: model attribution]
Model fit
Cyclone
[Figures: cyclone, in the order High-resolution, Low-resolution, Medium-resolution, Temporally-sparse]
Heat stress
[Figures: heat stress]
Details on predictive measures
Each of these predictive measures yields a single number, with lower scores indicating better performance.
- 95% coverage error (CvgErr): evaluates how often the prediction intervals include the true observations, with the goal of capturing the true values 95% of the time. It is estimated as follows:
\[ \text{CvgErr}(y, \ell, u) \;=\; \left| 0.95 \;-\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left( \ell_i < y_i < u_i \right) \right| \]
where \(y = \{y_1, y_2, \dots, y_n\}\) are the coral cover observations, \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(\mathbf{1}(\cdot)\) is the indicator function (1 if the condition is true, 0 otherwise).
- 95% interval score (IS): rewards prediction intervals that include the true observations (accuracy) and penalizes those that are too narrow or too wide (precision). It is computed as follows:
\[ \text{IS}_{95} \;=\; \frac{1}{n} \sum_{i=1}^{n} \Bigg[ (u_i - \ell_i) + \frac{2}{\alpha} (\ell_i - y_i)\,\mathbf{1}(y_i < \ell_i) + \frac{2}{\alpha} (y_i - u_i)\,\mathbf{1}(y_i > u_i) \Bigg] \]
where \(\alpha = 0.05\), \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(y\) are the observed coral cover values.
- Root-mean-squared prediction error (RMSPE): measures how far model predictions are from the true observations, without accounting for uncertainty.
\[ \text{RMSPE} \;=\; \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } \]
where \(y\) and \(\hat{y}\) are the observed and predicted coral cover values, respectively, and \(n\) is the total number of observations.
- Continuous Ranked Probability Score (CRPS): measures the quality of the predictions over the entire predictive probability distribution, penalizing predictions that are inaccurate, imprecise or overconfident.
\[ \text{CRPS}(F, y) \;=\; \sigma \left[ z \left( 2 \Phi(z) - 1 \right) \;+\; 2 \,\phi(z) \;-\; \frac{1}{\sqrt{\pi}} \right], \quad z = \frac{y - \mu}{\sigma} \]
where \(y\) is an observed coral cover value, \(\mu\) and \(\sigma\) are the mean and standard deviation of the predictive normal distribution, \(\phi(\cdot)\) is the standard normal probability density function and \(\Phi(\cdot)\) is the standard normal cumulative distribution function.
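The four measures above translate almost line for line into code. The following is a minimal sketch in plain Python, written directly from the formulas and not taken from the modelling pipeline; the function names are hypothetical, and `crps_normal` scores a single observation under the normal predictive distribution assumed by the CRPS formula.

```python
import math

ALPHA = 0.05  # nominal 95% prediction intervals


def coverage_error(y, lower, upper, level=0.95):
    """|level - empirical coverage| of the prediction intervals."""
    inside = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo < yi < hi)
    return abs(level - inside / len(y))


def interval_score(y, lower, upper, alpha=ALPHA):
    """Mean interval score: interval width plus penalties for misses."""
    total = 0.0
    for yi, lo, hi in zip(y, lower, upper):
        score = hi - lo
        if yi < lo:
            score += (2 / alpha) * (lo - yi)
        elif yi > hi:
            score += (2 / alpha) * (yi - hi)
        total += score
    return total / len(y)


def rmspe(y, y_hat):
    """Root-mean-squared prediction error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of one observation under a N(mu, sigma^2) forecast."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # phi(z)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))           # Phi(z)
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


if __name__ == "__main__":
    # Made-up values, for illustration only.
    y = [10.0, 20.0, 30.0, 40.0]
    lower = [5.0, 15.0, 35.0, 30.0]
    upper = [15.0, 25.0, 45.0, 50.0]
    print(coverage_error(y, lower, upper))
    print(interval_score(y, lower, upper))
    print(rmspe(y, [12.0, 22.0, 32.0, 42.0]))
    print(crps_normal(30.0, 30.0, 5.0))
```

In the made-up example, the third interval misses its observation, so the interval score for that prediction is dominated by the \(2/\alpha\) penalty term.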